OCPBUGS-83580: Load fuse kernel module before testing /dev/fuse in CRI-O#31044

Open
Chandan9112 wants to merge 1 commit into openshift:main from Chandan9112:fix-devfuse-modprobe-fuse

Conversation

@Chandan9112
Contributor

@Chandan9112 Chandan9112 commented Apr 21, 2026

Bug

OCPBUGS-83580

Follow-up PR

#PR

Problem

The OCP-70987 test (Allow dev fuse by default in CRI-O) was failing across multiple CI platforms (AWS, GCP, vSphere, Metal+IPv6) with error: stat: cannot statx '/dev/fuse': No such file or directory

The test relies on the CRI-O annotation io.kubernetes.cri-o.Devices: "/dev/fuse" to mount /dev/fuse into the pod. However, this annotation is only supported by the crun runtime. When the default CRI-O runtime is runc, it does not include io.kubernetes.cri-o.Devices in its allowed_annotations, so CRI-O silently ignores the annotation and the pod starts without /dev/fuse mounted.

Root Cause

The test did not account for clusters using runc as the default CRI-O runtime. The /dev/fuse device annotation is a crun-specific feature, and running the test on runc nodes will always fail.
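
The silent-ignore behavior described above can be pictured with a small model. This is only a sketch under stated assumptions: `honoredAnnotations` is a hypothetical helper, the allow-lists in `main` are made up for illustration, and the exact-key match is a simplification (CRI-O's real matching logic may differ). It is not CRI-O's implementation.

```go
package main

import "fmt"

// honoredAnnotations is a simplified, hypothetical model of annotation
// filtering: only keys listed in the runtime handler's allow-list survive.
// Anything else is dropped without an error, which is why the pod still starts.
func honoredAnnotations(requested map[string]string, allowed []string) map[string]string {
	allowedSet := make(map[string]struct{}, len(allowed))
	for _, key := range allowed {
		allowedSet[key] = struct{}{}
	}
	honored := make(map[string]string)
	for key, value := range requested {
		if _, ok := allowedSet[key]; ok {
			honored[key] = value
		}
	}
	return honored
}

func main() {
	podAnnotations := map[string]string{"io.kubernetes.cri-o.Devices": "/dev/fuse"}

	// Assumed allow-lists for illustration only; the real values come from
	// the node's CRI-O configuration.
	crunAllowed := []string{"io.kubernetes.cri-o.Devices"}
	runcAllowed := []string{}

	fmt.Println("crun honors:", len(honoredAnnotations(podAnnotations, crunAllowed)))
	fmt.Println("runc honors:", len(honoredAnnotations(podAnnotations, runcAllowed)))
}
```

In this model the pod is created in both cases; only the set of honored annotations differs, which matches the symptom of a running pod without /dev/fuse.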

Fix

  • Added a runtime check before test execution: query crio config on a worker node to detect the default runtime.
  • Skip the test with a clear message when runc is the default runtime, since the /dev/fuse device annotation is not applicable there.
  • Replaced the previous modprobe fuse workaround with this more targeted skip.
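
As an illustration of the runtime check, the sketch below parses `crio config` style output and decides whether to skip. It assumes the output contains a TOML line of the form `default_runtime = "crun"`; the helper names `parseDefaultRuntime` and `shouldSkipDevFuseTest` are hypothetical and not part of this PR. Unlike a plain substring match, it ignores commented-out lines and compares the value exactly.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseDefaultRuntime extracts the value of the first uncommented
// `default_runtime = "..."` line from `crio config` style output.
// It returns "" when no such line is present.
func parseDefaultRuntime(configOutput string) string {
	scanner := bufio.NewScanner(strings.NewReader(configOutput))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if strings.HasPrefix(line, "#") {
			continue // skip commented-out defaults
		}
		key, value, found := strings.Cut(line, "=")
		if !found || strings.TrimSpace(key) != "default_runtime" {
			continue
		}
		return strings.Trim(strings.TrimSpace(value), `"`)
	}
	return ""
}

// shouldSkipDevFuseTest mirrors the decision described above:
// skip when runc is the default runtime.
func shouldSkipDevFuseTest(configOutput string) bool {
	return parseDefaultRuntime(configOutput) == "runc"
}

func main() {
	sample := "[crio.runtime]\n# default_runtime = \"runc\"\ndefault_runtime = \"crun\"\n"
	fmt.Println(parseDefaultRuntime(sample), shouldSkipDevFuseTest(sample))
}
```

In the real test the input string would be whatever the node returns for `crio config`; here it is a hand-written sample.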

Testing

Tested on a fresh OCP 4.22 GCP cluster (nightly 4.22.0-0.nightly-2026-04-26-170224):

crun (default): Test passed — /dev/fuse successfully mounted inside the pod
runc (via ContainerRuntimeConfig): Test correctly skipped with message: Skipping: not applicable to runc runtime

➜  origin git:(fix-devfuse-modprobe-fuse) ./openshift-tests run-test "[sig-node] [Jira:Node/Kubelet] Kubelet, CRI-O, CPU manager [OTP] Allow dev fuse by default in CRI-O [OCP-70987] [Suite:openshift/conformance/parallel]"

  Running Suite:  - /Users/cmaurya/go/src/github.com/openshift/origin
  ===================================================================
  Random Seed: 1777308957 - will randomize all specs

  Will run 1 of 1 specs
  ------------------------------
  [sig-node] [Jira:Node/Kubelet] Kubelet, CRI-O, CPU manager [OTP] Allow dev fuse by default in CRI-O [OCP-70987]
  github.com/openshift/origin/test/extended/node/node_e2e/node.go:111
    STEP: Creating a kubernetes client @ 04/27/26 22:26:01.05
  I0427 22:26:01.051091    5408 discovery.go:214] Invalidating discovery information
  I0427 22:26:01.056870 5408 framework.go:2330] [precondition-check] checking if cluster is MicroShift
  I0427 22:26:01.427179 5408 framework.go:2353] IsMicroShiftCluster: microshift-version configmap not found, not MicroShift
    STEP: Check if the default CRI-O runtime is runc @ 04/27/26 22:26:01.427
    STEP: Create a test namespace @ 04/27/26 22:26:14.175
  namespace/devfuse-test created
    STEP: Create a pod with dev fuse annotation @ 04/27/26 22:26:14.891
  pod/pod-devfuse created
    STEP: Wait for pod to be ready @ 04/27/26 22:26:16.326
    STEP: Check /dev/fuse is mounted inside the pod @ 04/27/26 22:26:22.043
  I0427 22:26:23.463019 5408 node.go:152] /dev/fuse mount output: File: /dev/fuse
    Size: 0             Blocks: 0          IO Block: 4096   character special file
  Device: 1000feh/1048830d      Inode: 6           Links: 1     Device type: a,e5
  Access: (0666/crw-rw-rw-)  Uid: (    0/    root)   Gid: (    0/    root)
  Access: 2026-04-27 16:56:18.967382166 +0000
  Modify: 2026-04-27 16:56:18.967382166 +0000
  Change: 2026-04-27 16:56:18.968382164 +0000
   Birth: 2026-04-27 16:56:18.967382166 +0000
  namespace "devfuse-test" deleted
  • [67.736 seconds]
  ------------------------------

  Ran 1 of 1 Specs in 67.737 seconds
  SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 0 Skipped

**Test skipped with runc as the default runtime**

    STEP: Check if the default CRI-O runtime is runc @ 04/27/26 22:41:22.744
    [SKIPPED] in [It] - github.com/openshift/origin/test/extended/node/node_e2e/node.go:122 @ 04/27/26 22:41:28.02
  S [SKIPPED] [5.703 seconds]
  [sig-node] [Jira:Node/Kubelet] Kubelet, CRI-O, CPU manager [It] [OTP] Allow dev fuse by default in CRI-O [OCP-70987]
  github.com/openshift/origin/test/extended/node/node_e2e/node.go:111

    [SKIPPED] Skipping: not applicable to runc runtime
    In [It] at: github.com/openshift/origin/test/extended/node/node_e2e/node.go:122 @ 04/27/26 22:41:28.02
  ------------------------------

  Ran 0 of 1 Specs in 5.704 seconds
  SUCCESS! -- 0 Passed | 0 Failed | 0 Pending | 1 Skipped

Summary by CodeRabbit

  • Tests
    • Strengthened dev-fuse e2e test preconditions: confirm at least one worker node exists and check the container runtime before proceeding, so the test fails or skips early if conditions aren't met.
    • Adjusted namespace setup flow to ensure precondition checks run before any pod or namespace creation.

@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Apr 21, 2026
@openshift-ci-robot

@Chandan9112: This pull request references Jira Issue OCPBUGS-83580, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (cmaurya@redhat.com), skipping review request.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Bug

OCPBUGS-83580

Follow-up PR

#PR

Problem

The OCP-70987 test (Allow dev fuse by default in CRI-O) was failing across multiple CI platforms (AWS, GCP, vSphere, Metal+IPv6) with error:
stat: cannot statx '/dev/fuse': No such file or directory

The test relies on the CRI-O annotation io.kubernetes.cri-o.Devices: "/dev/fuse" to mount /dev/fuse into the pod. However, on CI nodes where the fuse kernel module is not loaded, /dev/fuse does not exist on the host. CRI-O silently ignores the annotation when the device is missing, so the pod starts successfully but without /dev/fuse mounted.

Root Cause

The fuse kernel module is not loaded on some CI cluster nodes. Without it, /dev/fuse doesn't exist on the host, and CRI-O cannot bind-mount it into the container.

Fix

Added a modprobe fuse step on a worker node before creating the test pod. This ensures the fuse kernel module is loaded so /dev/fuse is available on the host for CRI-O to mount into the container.

  • modprobe fuse is idempotent — if the module is already loaded, it's a no-op

Testing

Tested on a fresh OCP 4.22 GCP cluster:

➜  origin git:(fix-devfuse-modprobe-fuse) ./openshift-tests run-test "[sig-node] [Jira:Node/Kubelet] Kubelet, CRI-O, CPU manager [OTP] Allow dev fuse by default in CRI-O [OCP-70987] [Suite:openshift/conformance/parallel]"

 Running Suite:  - /Users/cmaurya/go/src/github.com/openshift/origin
 ===================================================================
 Random Seed: 1776750632 - will randomize all specs

 Will run 1 of 1 specs
 ------------------------------
 [sig-node] [Jira:Node/Kubelet] Kubelet, CRI-O, CPU manager [OTP] Allow dev fuse by default in CRI-O [OCP-70987]
 github.com/openshift/origin/test/extended/node/node_e2e/node.go:111
   STEP: Creating a kubernetes client @ 04/21/26 11:20:36.666
 I0421 11:20:36.668426   56742 discovery.go:214] Invalidating discovery information
 I0421 11:20:36.670670 56742 framework.go:2330] [precondition-check] checking if cluster is MicroShift
 I0421 11:20:36.968916 56742 framework.go:2353] IsMicroShiftCluster: microshift-version configmap not found, not MicroShift
   STEP: Ensure fuse kernel module is loaded on a worker node @ 04/21/26 11:20:36.969
   STEP: Create a test namespace @ 04/21/26 11:20:41.146
 namespace/devfuse-test created
   STEP: Create a pod with dev fuse annotation @ 04/21/26 11:20:41.846
 pod/pod-devfuse created
   STEP: Wait for pod to be ready @ 04/21/26 11:20:43.223
   STEP: Check /dev/fuse is mounted inside the pod @ 04/21/26 11:20:48.935
 I0421 11:20:50.331485 56742 node.go:149] /dev/fuse mount output: File: /dev/fuse
   Size: 0             Blocks: 0          IO Block: 4096   character special file
 Device: 1000c4h/1048772d      Inode: 6           Links: 1     Device type: a,e5
 Access: (0666/crw-rw-rw-)  Uid: (    0/    root)   Gid: (    0/    root)
 Access: 2026-04-21 05:50:44.558306685 +0000
 Modify: 2026-04-21 05:50:44.558306685 +0000
 Change: 2026-04-21 05:50:44.558306685 +0000
  Birth: 2026-04-21 05:50:44.558306685 +0000
 namespace "devfuse-test" deleted
 • [58.837 seconds]
 ------------------------------
 Ran 1 of 1 Specs in 58.838 seconds
 SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 0 Skipped

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot requested review from BhargaviGudi and mrunalp April 21, 2026 06:06
@openshift-ci
Contributor

openshift-ci Bot commented Apr 21, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Chandan9112
Once this PR has been reviewed and has the lgtm label, please assign cpmeadors for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Apr 21, 2026

Walkthrough

The dev-fuse e2e test now selects worker nodes, asserts at least one exists, queries CRI-O’s default_runtime on a worker and skips the test if it contains runc, and reuses the existing err variable during namespace creation.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Dev-Fuse E2E Test: test/extended/node/node_e2e/node.go | List worker nodes by label and assert non-empty; exec a remote command on a worker to read CRI-O default_runtime and skip the test if it includes runc; adjust namespace creation to assign into the existing err (err = ...). |

Sequence Diagram(s)

sequenceDiagram
    participant TestRunner
    participant KubernetesAPI
    participant WorkerNode
    participant CRIO

    TestRunner->>KubernetesAPI: List nodes (label=worker)
    KubernetesAPI-->>TestRunner: Node list
    TestRunner->>WorkerNode: Exec command to read CRI-O config (e.g., cat /etc/crio/crio.conf)
    WorkerNode->>CRIO: Access CRI-O config
    CRIO-->>WorkerNode: default_runtime value
    WorkerNode-->>TestRunner: command output
    alt default_runtime contains "runc"
        TestRunner->>TestRunner: Skip test
    else
        TestRunner->>TestRunner: Create namespace (reuse err variable)
        TestRunner->>WorkerNode: modprobe fuse (for each worker)
        TestRunner->>TestRunner: Continue test setup
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Test Structure And Quality | ⚠️ Warning | Test lacks the critical modprobe fuse precondition step required to fix the /dev/fuse flake on non-runc workers, and has inconsistent assertion messages. | Add modprobe fuse step on worker node before namespace creation and add meaningful failure messages to all assertions lacking them. |
| Topology-Aware Scheduling Compatibility | ⚠️ Warning | Test code uses exclusive worker node label selector, failing on SNO/TNF/TNA topologies where no dedicated worker nodes exist. | Implement topology-aware node selection with fallback to control-plane nodes or skip test on SNO/TNF/TNA topologies. |
✅ Passed checks (10 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: loading the fuse kernel module before testing /dev/fuse in CRI-O, which directly addresses the root cause documented in the PR objectives. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | Test maintains stable and deterministic names with static title and step descriptions, no dynamic values in test identifiers. |
| Microshift Test Compatibility | ✅ Passed | The test is protected from running on MicroShift by an existing BeforeEach guard in its parent Describe block that checks exutil.IsMicroShiftCluster() and skips all tests when on MicroShift. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | Test operates on single worker node only; no multi-node, HA, failover, or topology constraints; compatible with SNO. |
| Ote Binary Stdout Contract | ✅ Passed | File test/extended/node/node_e2e/node.go contains only test suite definitions with no process-level stdout writes or logging violations. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | Test uses only cluster-internal Kubernetes service DNS and dynamically discovered node names via label selectors with no external connectivity requirements. |

@openshift-ci-robot

@Chandan9112: This pull request references Jira Issue OCPBUGS-83580, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (cmaurya@redhat.com), skipping review request.

Details

In response to this:

Summary by CodeRabbit

  • Tests
  • Enhanced dev-fuse e2e test robustness by adding a precondition step to verify worker node availability and load required kernel module before test execution.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/extended/node/node_e2e/node.go`:
- Around line 115-120: The current logic only loads the fuse module on the first
worker fetched (jsonpath {.items[0]}) which fails when the test pod schedules to
another worker; change the node discovery to fetch all worker names (e.g.,
jsonpath {.items[*].metadata.name}), split the output into individual node
names, then loop over each node and call nodeutils.ExecOnNodeWithChroot(oc,
strings.TrimSpace(nodeName), "modprobe", "fuse"), checking
o.Expect(err).NotTo(o.HaveOccurred()) for each invocation so every schedulable
worker has /dev/fuse loaded before the pod runs.
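
The per-worker loop suggested in this comment could look roughly like the sketch below. The `nodeExec` type is a hypothetical stand-in for a helper such as nodeutils.ExecOnNodeWithChroot (its real signature is not reproduced here), and `runOnAllWorkers` is an illustrative name, not code from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeExec is a hypothetical stand-in for a remote-exec helper;
// it runs a command on the named node and returns its output.
type nodeExec func(node string, command ...string) (string, error)

// runOnAllWorkers splits whitespace-separated node names (the shape produced
// by jsonpath {.items[*].metadata.name}) and runs the command on each one,
// stopping at the first error.
func runOnAllWorkers(nodeNames string, exec nodeExec, command ...string) ([]string, error) {
	workers := strings.Fields(nodeNames)
	if len(workers) == 0 {
		return nil, fmt.Errorf("no worker nodes found")
	}
	for _, worker := range workers {
		if _, err := exec(worker, command...); err != nil {
			return nil, fmt.Errorf("running %q on %s: %w", strings.Join(command, " "), worker, err)
		}
	}
	return workers, nil
}

func main() {
	// Fake executor that only prints what it would run.
	fake := func(node string, command ...string) (string, error) {
		fmt.Printf("%s: %s\n", node, strings.Join(command, " "))
		return "", nil
	}
	if _, err := runOnAllWorkers("worker-a worker-b", fake, "modprobe", "fuse"); err != nil {
		fmt.Println("error:", err)
	}
}
```

Injecting the executor keeps the loop testable without a cluster; a real test would pass a closure that calls the node helper.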

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 2913005d-20d0-4383-82e0-aba7fc8eb55e

📥 Commits

Reviewing files that changed from the base of the PR and between ff48d7a and aefb892.

📒 Files selected for processing (1)
  • test/extended/node/node_e2e/node.go

Comment thread test/extended/node/node_e2e/node.go Outdated
@Chandan9112 Chandan9112 force-pushed the fix-devfuse-modprobe-fuse branch from aefb892 to 28fc03c Compare April 21, 2026 06:22
@Chandan9112
Contributor Author

/retest

@joepvd
Contributor

joepvd commented Apr 21, 2026

/test images

1 similar comment
@fgallott

/test images

@openshift-merge-bot
Contributor

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@Chandan9112
Contributor Author

/retest

@openshift-ci
Contributor

openshift-ci Bot commented Apr 22, 2026

@Chandan9112: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-ovn-microshift | 28fc03c | link | true | /test e2e-aws-ovn-microshift |
| ci/prow/e2e-aws-ovn-microshift-serial | 28fc03c | link | true | /test e2e-aws-ovn-microshift-serial |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@BhargaviGudi
Contributor

/payload 4.22 nightly blocking

@openshift-ci
Contributor

openshift-ci Bot commented Apr 22, 2026

@BhargaviGudi: trigger 13 job(s) of type blocking for the nightly release of OCP 4.22

  • periodic-ci-openshift-release-main-ci-4.22-e2e-aws-upgrade-ovn-single-node
  • periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-upgrade-fips
  • periodic-ci-openshift-release-main-ci-4.22-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-main-ci-4.22-upgrade-from-stable-4.21-e2e-gcp-ovn-rt-upgrade
  • periodic-ci-openshift-hypershift-release-4.22-periodics-e2e-aws-ovn-conformance
  • periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-serial-1of2
  • periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-serial-2of2
  • periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-techpreview
  • periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-techpreview-serial-1of3
  • periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-techpreview-serial-2of3
  • periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-techpreview-serial-3of3
  • periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv4
  • periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/38210840-3e0a-11f1-8a06-a9e051c3da43-0

@Chandan9112
Contributor Author

/retest

@openshift-trt

openshift-trt Bot commented Apr 22, 2026

Job Failure Risk Analysis for sha: 28fc03c

| Job Name | Failure Risk |
| --- | --- |
| pull-ci-openshift-origin-main-e2e-aws-ovn-microshift | IncompleteTests. Tests for this run (29) are below the historical average (1427): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems) |
| pull-ci-openshift-origin-main-e2e-aws-ovn-microshift-serial | IncompleteTests. Tests for this run (29) are below the historical average (728): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems) |

Comment thread test/extended/node/node_e2e/node.go Outdated
workers := strings.Fields(nodeNames)
o.Expect(workers).NotTo(o.BeEmpty(), "No worker nodes found")
for _, worker := range workers {
_, err = nodeutils.ExecOnNodeWithChroot(oc, worker, "modprobe", "fuse")
Contributor

This line permanently changes the state of the node by loading fuse, so all tests that follow are affected.
We either need to unload fuse after the test case if it was loaded here, or run this test on a new node and destroy it.
You could try modprobe -r fuse in a defer, but that could fail if something is using it.
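
One way to picture the deferred unload is the sketch below, which only unloads the module when the test itself loaded it. Everything here is an assumption for illustration: `moduleOps` and `ensureModule` are hypothetical names, and the two hooks would need to be backed by remote execution on the node in a real test.

```go
package main

import (
	"errors"
	"fmt"
)

// moduleOps abstracts the node-side commands; both fields are hypothetical
// hooks that a real test would back with remote execution on the node.
type moduleOps struct {
	isLoaded func(module string) (bool, error)
	run      func(args ...string) error
}

// ensureModule loads the module only when it is not already loaded and returns
// a cleanup function that unloads it again. When the module was already
// present, the cleanup is a no-op so pre-existing node state is left alone.
func ensureModule(ops moduleOps, module string) (func() error, error) {
	loaded, err := ops.isLoaded(module)
	if err != nil {
		return nil, err
	}
	if loaded {
		return func() error { return nil }, nil
	}
	if err := ops.run("modprobe", module); err != nil {
		return nil, err
	}
	// modprobe -r can fail while the module is in use; the caller decides
	// whether that is a test failure or only worth logging.
	return func() error { return ops.run("modprobe", "-r", module) }, nil
}

func main() {
	ops := moduleOps{
		isLoaded: func(string) (bool, error) { return false, nil },
		run: func(args ...string) error {
			fmt.Println("would run:", args)
			if len(args) > 1 && args[1] == "-r" {
				return errors.New("module fuse is in use")
			}
			return nil
		},
	}
	cleanup, err := ensureModule(ops, "fuse")
	if err != nil {
		fmt.Println("load failed:", err)
		return
	}
	defer func() {
		if err := cleanup(); err != nil {
			fmt.Println("cleanup failed (module left loaded):", err)
		}
	}()
	fmt.Println("test body runs here")
}
```

The fake `run` hook in `main` deliberately fails the unload to show the case raised in this comment: the cleanup error is surfaced rather than hidden.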

@Chandan9112 Chandan9112 force-pushed the fix-devfuse-modprobe-fuse branch from 28fc03c to 1ca4217 Compare April 27, 2026 17:22

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/extended/node/node_e2e/node.go`:
- Around line 115-123: The runtime-based skip removed the prior host-side
fuse-module precondition and can allow tests to proceed without /dev/fuse
present; restore the fuse check or add an explicit /dev/fuse verification on the
worker before continuing. Specifically, before the runtime check that uses
workerName and nodeutils.ExecOnNodeWithChroot, either reinsert the
modprobe/ensure-fuse step that was originally run on the host or run a command
via nodeutils.ExecOnNodeWithChroot (using strings.TrimSpace(workerName)) to test
for the presence of /dev/fuse (e.g., test -e /dev/fuse) and fail/skip if
missing; ensure this check executes unconditionally prior to creating pods so
the original fuse-module precondition is enforced.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 351c77c3-95a9-4ec3-ae03-a1d06e555048

📥 Commits

Reviewing files that changed from the base of the PR and between 28fc03c and 1ca4217.

📒 Files selected for processing (1)
  • test/extended/node/node_e2e/node.go

Comment on lines +115 to +123
g.By("Check if the default CRI-O runtime is runc")
workerName, err := oc.AsAdmin().WithoutNamespace().Run("get").Args("nodes", "-l", "node-role.kubernetes.io/worker", "-o=jsonpath={.items[0].metadata.name}").Output()
o.Expect(err).NotTo(o.HaveOccurred())
o.Expect(workerName).NotTo(o.BeEmpty(), "No worker nodes found")
runtime, err := nodeutils.ExecOnNodeWithChroot(oc, strings.TrimSpace(workerName), "/bin/bash", "-c", "crio config 2>/dev/null | grep 'default_runtime'")
o.Expect(err).NotTo(o.HaveOccurred())
if strings.Contains(runtime, "runc") {
	g.Skip("Skipping: not applicable to runc runtime")
}

@coderabbitai coderabbitai Bot Apr 27, 2026

⚠️ Potential issue | 🟠 Major

The runtime skip doesn’t replace the fuse-module precondition.

This still goes straight to pod setup without any host-side modprobe fuse check, so the original /dev/fuse flake can still happen on non-runc workers. Please keep the fuse-module step before this logic, or verify /dev/fuse exists on the worker before continuing.

Suggested fix
 g.By("Check if the default CRI-O runtime is runc")
 workerName, err := oc.AsAdmin().WithoutNamespace().Run("get").Args("nodes", "-l", "node-role.kubernetes.io/worker", "-o=jsonpath={.items[0].metadata.name}").Output()
 o.Expect(err).NotTo(o.HaveOccurred())
 o.Expect(workerName).NotTo(o.BeEmpty(), "No worker nodes found")
 runtime, err := nodeutils.ExecOnNodeWithChroot(oc, strings.TrimSpace(workerName), "/bin/bash", "-c", "crio config 2>/dev/null | grep 'default_runtime'")
 o.Expect(err).NotTo(o.HaveOccurred())
 if strings.Contains(runtime, "runc") {
 	g.Skip("Skipping: not applicable to runc runtime")
 }
+
+g.By("Ensure fuse kernel module is loaded before creating the pod")
+_, err = nodeutils.ExecOnNodeWithChroot(oc, strings.TrimSpace(workerName), "/bin/bash", "-c", "modprobe fuse")
+o.Expect(err).NotTo(o.HaveOccurred())
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
g.By("Check if the default CRI-O runtime is runc")
workerName, err := oc.AsAdmin().WithoutNamespace().Run("get").Args("nodes", "-l", "node-role.kubernetes.io/worker", "-o=jsonpath={.items[0].metadata.name}").Output()
o.Expect(err).NotTo(o.HaveOccurred())
o.Expect(workerName).NotTo(o.BeEmpty(), "No worker nodes found")
runtime, err := nodeutils.ExecOnNodeWithChroot(oc, strings.TrimSpace(workerName), "/bin/bash", "-c", "crio config 2>/dev/null | grep 'default_runtime'")
o.Expect(err).NotTo(o.HaveOccurred())
if strings.Contains(runtime, "runc") {
g.Skip("Skipping: not applicable to runc runtime")
}
g.By("Check if the default CRI-O runtime is runc")
workerName, err := oc.AsAdmin().WithoutNamespace().Run("get").Args("nodes", "-l", "node-role.kubernetes.io/worker", "-o=jsonpath={.items[0].metadata.name}").Output()
o.Expect(err).NotTo(o.HaveOccurred())
o.Expect(workerName).NotTo(o.BeEmpty(), "No worker nodes found")
runtime, err := nodeutils.ExecOnNodeWithChroot(oc, strings.TrimSpace(workerName), "/bin/bash", "-c", "crio config 2>/dev/null | grep 'default_runtime'")
o.Expect(err).NotTo(o.HaveOccurred())
if strings.Contains(runtime, "runc") {
g.Skip("Skipping: not applicable to runc runtime")
}
g.By("Ensure fuse kernel module is loaded before creating the pod")
_, err = nodeutils.ExecOnNodeWithChroot(oc, strings.TrimSpace(workerName), "/bin/bash", "-c", "modprobe fuse")
o.Expect(err).NotTo(o.HaveOccurred())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/node/node_e2e/node.go` around lines 115 - 123, The
runtime-based skip removed the prior host-side fuse-module precondition and can
allow tests to proceed without /dev/fuse present; restore the fuse check or add
an explicit /dev/fuse verification on the worker before continuing.
Specifically, before the runtime check that uses workerName and
nodeutils.ExecOnNodeWithChroot, either reinsert the modprobe/ensure-fuse step
that was originally run on the host or run a command via
nodeutils.ExecOnNodeWithChroot (using strings.TrimSpace(workerName)) to test for
the presence of /dev/fuse (e.g., test -e /dev/fuse) and fail/skip if missing;
ensure this check executes unconditionally prior to creating pods so the
original fuse-module precondition is enforced.
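
For reference, a device-presence check of the kind described here can be sketched in Go as follows. This assumes the check runs in a process on the node itself; the test in this PR instead runs commands remotely, so `isCharDevice` is a hypothetical helper shown only to illustrate the logic behind `test -e /dev/fuse` plus a file-type check.

```go
package main

import (
	"fmt"
	"os"
)

// isCharDevice reports whether path exists and is a character device.
// A missing path is reported as (false, nil) so callers can choose to skip
// rather than fail.
func isCharDevice(path string) (bool, error) {
	info, err := os.Stat(path)
	if os.IsNotExist(err) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return info.Mode()&os.ModeCharDevice != 0, nil
}

func main() {
	ok, err := isCharDevice("/dev/fuse")
	fmt.Println(ok, err)
}
```

Checking the file type as well as existence guards against a regular file or directory sitting at the device path.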

Contributor Author

All failures were on clusters using runc as the default CRI-O runtime, which does not include io.kubernetes.cri-o.Devices in its allowed_annotations. On crun clusters, /dev/fuse has always been available and the test has never failed.

@Chandan9112, thank you for the detailed clarification. That makes sense — since io.kubernetes.cri-o.Devices is not listed in allowed_annotations for runc-based clusters, CRI-O would silently ignore the device annotation regardless of whether /dev/fuse exists on the host, making the test meaningless and prone to false positives/failures in that environment. Skipping early on runc is the correct fix. I'll withdraw my concern.


@openshift-merge-bot
Contributor

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

Labels

jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

6 participants